
    Sparse Exploratory Factor Analysis

    Sparse principal component analysis has been a very active research area over the last decade. It produces component loadings with many zero entries, which facilitates their interpretation and helps avoid redundant variables. Classic factor analysis is another popular dimension reduction technique that shares similar interpretation problems and could greatly benefit from sparse solutions. Unfortunately, there are very few works considering sparse versions of classic factor analysis. Our goal is to contribute further in this direction. We revisit the most popular procedures for exploratory factor analysis, maximum likelihood and least squares. Sparse factor loadings are obtained for them by, first, adopting a special reparameterization and, second, introducing additional ℓ1-norm penalties into the standard factor analysis problems. As a result, we propose sparse versions of the major factor analysis procedures. We illustrate the developed algorithms on well-known psychometric problems. Our sparse solutions are critically compared to those obtained by other existing methods.
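
    To make the idea concrete, here is a minimal numpy sketch of the least-squares flavour of this approach (not the authors' algorithm): fit R ≈ L L' + diag(psi) by proximal gradient on the loadings, where the soft-thresholding step plays the role of the ℓ1-norm penalty. The penalty weight, step size and iteration count are arbitrary placeholders.

        import numpy as np

        def sparse_ls_factor_analysis(R, k, lam=0.1, step=0.01, iters=2000):
            """Toy l1-penalised least-squares FA: R ~ L @ L.T + diag(psi)."""
            p = R.shape[0]
            rng = np.random.default_rng(0)
            L = 0.1 * rng.standard_normal((p, k))       # factor loadings
            psi = np.ones(p)                            # unique variances
            for _ in range(iters):
                E = R - L @ L.T - np.diag(psi)          # symmetric residual
                L = L + step * 4 * E @ L                # gradient step on the loadings
                L = np.sign(L) * np.maximum(np.abs(L) - step * lam, 0.0)  # soft-threshold (l1 prox)
                psi = np.clip(np.diag(R - L @ L.T), 1e-6, None)           # closed-form diagonal update
            return L, psi

    Run on a sample correlation matrix, this returns loadings whose small entries have been driven exactly to zero, which is the interpretational gain the paper targets.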

    Archetypal Analysis: Mining Weather and Climate Extremes

    Conventional analysis methods in weather and climate science (e.g., EOF analysis) exhibit a number of drawbacks, including scaling and mixing. These methods focus mostly on the bulk of the probability distribution of the system in state space and overlook its tail. This paper explores a different method, archetypal analysis (AA), which focuses precisely on the extremes. AA seeks to approximate the convex hull of the data in state space by finding “corners” that represent “pure” types, or archetypes, through computing mixture weight matrices. The method is quite new in climate science, although it has been around for about two decades in pattern recognition. It encompasses, in particular, the virtues of EOFs and clustering. The method is presented along with a new manifold-based optimization algorithm that optimizes for the weights simultaneously, unlike the conventional multistep algorithm based on alternating constrained least squares. The paper discusses the numerical solution and then applies it to the monthly sea surface temperature (SST) from HadISST and to the Asian summer monsoon (ASM) using sea level pressure (SLP) from ERA-40 over the Asian monsoon region. The application to SST reveals, in particular, three archetypes, namely El Niño, La Niña, and a third pattern representing the western boundary currents. The latter archetype shows a particular trend over the last few decades. The application to the ASM SLP anomalies yields archetypes that are consistent with the ASM regimes found in the literature. Merits and weaknesses of the method, along with possible future developments, are also discussed.
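
    For readers new to AA, the following is a rough, self-contained sketch of the plain model (not the paper's manifold-based algorithm): minimise ||X − ABX||² with the rows of the weight matrices A and B constrained to the probability simplex, here handled by a row-wise softmax reparameterisation and a generic quasi-Newton solver; the archetypes are the rows of BX. The solver choice, initialisation and reliance on numerical gradients are my simplifications.

        import numpy as np
        from scipy.optimize import minimize

        def archetypal_analysis(X, k, seed=0):
            """Toy archetypal analysis: X ~ A @ (B @ X), rows of A and B on the simplex."""
            n, _ = X.shape
            rng = np.random.default_rng(seed)

            def softmax_rows(M):
                M = M - M.max(axis=1, keepdims=True)
                E = np.exp(M)
                return E / E.sum(axis=1, keepdims=True)

            def unpack(theta):
                A = softmax_rows(theta[:n * k].reshape(n, k))   # mixture weights (n x k)
                B = softmax_rows(theta[n * k:].reshape(k, n))   # archetype weights (k x n)
                return A, B

            def loss(theta):
                A, B = unpack(theta)
                return np.linalg.norm(X - A @ B @ X) ** 2       # reconstruction error

            theta0 = 0.01 * rng.standard_normal(2 * n * k)
            A, B = unpack(minimize(loss, theta0, method="L-BFGS-B").x)
            return B @ X, A                                     # archetypes and weights

    This is only practical for small data sets; real analyses would use the alternating constrained least squares or the manifold-based algorithm described above.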

    Classification in sparse, high dimensional environments applied to distributed systems failure prediction

    Network failures are still one of the main causes of distributed systems’ lack of reliability. To overcome this problem, we present an improvement over a failure prediction system, based on Elastic Net Logistic Regression and the application of rare-event prediction techniques, able to work with sparse, high-dimensional datasets. Specifically, we prove its stability, fine-tune its hyperparameter and improve its industrial utility by showing that, with a slight change in dataset creation, it can also predict the location of a failure, a key asset when trying to take a proactive approach to failure management.
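
    The core classifier described here can be sketched in a few lines of scikit-learn: elastic-net-penalised logistic regression, with class re-weighting as a simple stand-in for the rare-event techniques mentioned above. The synthetic data, split and hyperparameter values below are placeholders, not the paper's tuned setup.

        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import classification_report

        # stand-in for the sparse, high-dimensional failure dataset (1 = failure, rare)
        X, y = make_classification(n_samples=2000, n_features=500, n_informative=20,
                                   weights=[0.97, 0.03], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                  stratify=y, random_state=0)

        clf = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                                 C=1.0, class_weight="balanced",  # crude rare-event adjustment
                                 max_iter=5000)
        clf.fit(X_tr, y_tr)
        print(classification_report(y_te, clf.predict(X_te)))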

    Semi-sparse PCA

    It is well known that classical exploratory factor analysis (EFA) of data with more observations than variables has several types of indeterminacy. We study the factor indeterminacy and show some new aspects of this problem by considering EFA as a specific data matrix decomposition. We adopt a new approach to EFA estimation and achieve a new characterization of the factor indeterminacy problem. A new alternative model is proposed, which gives determinate factors and can be seen as a semi-sparse principal component analysis (PCA). An alternating algorithm is developed, in which each step solves a Procrustes problem. It is demonstrated that the new model/algorithm can act as a specific sparse PCA and as a low-rank-plus-sparse matrix decomposition. Numerical examples with several large data sets illustrate the versatility of the new model, and the performance and behaviour of its algorithmic implementation.
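
    The building block of the alternating algorithm mentioned above is the orthogonal Procrustes problem; the sketch below shows that generic step in isolation (find the orthogonal Q minimising ||A − BQ||_F via one SVD), not the full semi-sparse PCA procedure.

        import numpy as np

        def procrustes_rotation(A, B):
            """Orthogonal Procrustes: argmin_Q ||A - B @ Q||_F subject to Q.T @ Q = I."""
            U, _, Vt = np.linalg.svd(B.T @ A)   # closed-form solution via the SVD of B'A
            return U @ Vt

        # tiny check: recover a known rotation from random data
        rng = np.random.default_rng(0)
        B = rng.standard_normal((50, 4))
        Q_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))
        A = B @ Q_true
        print(np.allclose(B @ procrustes_rotation(A, B), A))   # True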

    Recipes for sparse LDA of horizontal data

    Many important modern applications require analyzing data with more variables than observations, called horizontal data for short. In this situation, classical Fisher’s linear discriminant analysis (LDA) has no solution because the within-group scatter matrix is singular. Moreover, the number of variables is usually huge, and the classical solutions (discriminant functions) are difficult to interpret because they involve all available variables. Nowadays, the aim is to develop fast and reliable algorithms for sparse LDA of horizontal data. The resulting discriminant functions depend on very few of the original variables, which facilitates their interpretation. The main theoretical and numerical challenge is how to cope with the singularity of the within-group scatter matrix. This work aims to classify the existing approaches according to the way they tackle this singularity issue, and to suggest new ones.
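
    As a toy illustration of one such recipe (regularise the singular within-group scatter, then sparsify the discriminant vector), the two-class sketch below ridge-regularises the within-group scatter and soft-thresholds the resulting Fisher direction; the ridge and threshold levels are arbitrary, and this is not any specific method from the survey.

        import numpy as np

        def sparse_lda_direction(X, y, ridge=1.0, threshold=0.1):
            """Toy two-class sparse LDA for horizontal (p >> n) data."""
            X0, X1 = X[y == 0], X[y == 1]
            m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
            Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)   # singular when p > n
            p = X.shape[1]
            w = np.linalg.solve(Sw + ridge * np.eye(p), m1 - m0)     # ridge-regularised direction
            w = np.sign(w) * np.maximum(np.abs(w) - threshold * np.abs(w).max(), 0.0)  # sparsify
            return w   # classify a new x by the sign of (x - (m0 + m1) / 2) @ w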

    Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

    Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains, but it remains a difficult task in terms of both clustering accuracy and interpretability of the results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace with an intrinsic dimension lower than the dimension of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is obtained, which can accommodate various situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets show that the proposed approach performs better than existing clustering methods while providing a useful representation of the clustered data. The method is also applied to the clustering of mass spectrometry data.
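
    As a crude illustration of the alternation behind this idea (and not the DLM model or the actual Fisher-EM algorithm), the sketch below alternates between fitting a Gaussian mixture in a low-dimensional subspace and re-estimating a Fisher-style discriminative subspace from the current cluster labels, using off-the-shelf scikit-learn components; the PCA initialisation and the fixed number of iterations are my assumptions.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.mixture import GaussianMixture
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

        def cluster_in_discriminative_subspace(X, k, n_iter=10, seed=0):
            d = k - 1                                   # Fisher subspace has at most k - 1 dimensions
            Z = PCA(n_components=d).fit_transform(X)    # rough initial subspace
            labels = GaussianMixture(k, random_state=seed).fit_predict(Z)
            for _ in range(n_iter):
                lda = LinearDiscriminantAnalysis(n_components=d).fit(X, labels)
                Z = lda.transform(X)                    # discriminative subspace for current labels
                labels = GaussianMixture(k, random_state=seed).fit_predict(Z)
            return labels, lda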

    Sparse PCA for compositional data

    A great number of procedures for sparse principal component analysis (PCA) have been proposed over the last decade. However, they cannot be applied directly to PCA of compositional data (CoDa). We introduce a new procedure for sparse PCA which takes into account the additional constraints specific to CoDa. The proposed method is very effective at finding log-contrasts in the data, as illustrated on a real example.
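
    For context, here is a minimal sketch of the naive baseline such a procedure improves on: clr-transform the compositions and run off-the-shelf sparse PCA. Ordinary sparse PCA does not force the loadings of each component to sum to zero, so they are not genuine log-contrasts; enforcing that CoDa-specific constraint is exactly the gap the proposed method addresses. The synthetic data and penalty value below are placeholders.

        import numpy as np
        from sklearn.decomposition import SparsePCA

        def clr(X):
            """Centred log-ratio transform of strictly positive compositions."""
            L = np.log(X)
            return L - L.mean(axis=1, keepdims=True)

        rng = np.random.default_rng(0)
        comps = rng.dirichlet(np.ones(10), size=200)          # 200 compositions with 10 parts
        spca = SparsePCA(n_components=3, alpha=1.0, random_state=0).fit(clr(comps))
        print(spca.components_)   # sparse loadings; rows need not sum to zero (not true log-contrasts)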